Abstract

In dynamic models of infectious disease transmission, typically various mixing
patterns are imposed on the so-called Who-Acquires-Infection-From-Whom matrix
(WAIFW). These imposed mixing patterns are based on prior knowledge of age- 
related social mixing behavior rather than observations. Alternatively, one can 
assume that transmission rates for infections transmitted predominantly through 
non-sexual social contacts, are proportional to rates of conversational contact which 
can be estimated from a contact survey. In general, however, contacts reported in 
social contact surveys are proxies of those events by which transmission may occur 
and there may exist age-specific characteristics related to susceptibility and infec- 
tiousness which are not captured by the contact rates. Therefore, in this paper, 
transmission is modeled as the product of two age-specific variables: the age-specific 
contact rate and an age-specific proportionality factor, which entails an improvement
of fit for the seroprevalence of the varicella-zoster virus (VZV) in Belgium.
Furthermore, we address the impact on the estimation of the basic reproduction 
number, using non-parametric bootstrapping to account for different sources of 
variability and using multi-model inference to deal with model selection uncer- 
tainty. The proposed method makes it possible to obtain important information 
on transmission dynamics that cannot be inferred from approaches traditionally 
applied hitherto. 

Keywords: basic reproduction number, bootstrap procedure, model selection 
and averaging, social contact data, transmission parameters, WAIFW. 



1 Introduction 

A first approach in modeling transmission dynamics of infectious diseases, and more
particularly in estimating age-dependent transmission rates, was described by Anderson
and May (1991). The idea is to impose different mixing patterns on the so-called
WAIFW-matrix, hereby constraining the number of distinct elements for identifiability
reasons, and to estimate the parameters from serological data. Many authors have
elaborated on this approach of Anderson and May (1991), among which Greenhalgh and
Dietz (1994), Farrington et al. (2001) and Van Effelterre et al. (2009). However,
estimates of important epidemiological parameters such as the basic reproduction
number $R_0$ turn out to be sensitive with respect to the choice of the imposed mixing
pattern (Greenhalgh and Dietz, 1994).

An alternative method was proposed by Farrington and Whitaker (2005), where
contact rates are modeled as a continuous contact surface and estimated from serological
data. Clearly, both methods involve a somewhat ad hoc choice, namely the structure
for the WAIFW-matrix and the parametric model for the contact surface. Alternatively,
to estimate age-dependent transmission parameters, Wallinga et al. (2006) augmented
seroprevalence data with auxiliary data on self-reported numbers of conversational
contacts per person, whilst assuming that transmission rates are proportional to rates of
conversational contact. The social contact surveys conducted as part of the POLYMOD
project (Mossong et al., 2008b; Hens et al., 2009a) allow us to elaborate on this
methodology presented by Wallinga et al. (2006).



The paper is organized as follows. In the next section, we outline the buildup of the 
Belgian social contact survey and the information available for each contact. Further, we 
briefly explain the epidemiological characteristics of VZV and the serological data from 
Belgium we use. In Section 3, we illustrate the traditional approach of imposing mixing 
patterns to estimate the WAIFW-matrix from this serological data set. In Section 4, 
a transition is made to the novel approach of using social contact data to estimate 
$R_0$. We show that a bivariate smoothing approach allows for a more flexible and
better estimate of the contact surface compared to the maximum likelihood estimation
method of Wallinga et al. (2006). Further, some refinements are proposed, among
which an elicitation of contacts with high transmission potential and a non-parametric
bootstrap approach, assessing sampling variability and accounting for age uncertainty,
as suggested by Halloran (2006).

Our main result is the novel method of disentangling the WAIFW-matrix into two 
components: the contact surface and an age-dependent proportionality factor. The 
proposed method, as described in Section 5, tackles two dimensions of uncertainty. 
First, by estimating the contact surface from data on social contacts, we overcome the 
problem of choosing a completely parametric model for the WAIFW-matrix. Second, to 
overcome the problem of model selection for the age-dependent proportionality factor, 
concepts of multi-model inference are applied and a model averaged estimate for $R_0$ is
calculated. Some concluding remarks are provided in the last section. 



2 Data 

2.1 Belgian contact survey 

Several small scale surveys were made in order to gain more insight into social mixing
behavior relevant to the spread of close contact infections (Edmunds et al., 1997;
Beutels et al., 2006; Edmunds et al., 2006; Wallinga et al., 2006; Mikolajczyk and
Kretzschmar, 2008). In order to refine on contact information, a large multi-country
population-based survey was conducted in Europe as part of the POLYMOD project
(Mossong et al., 2008b).

In Belgium, this survey was conducted in a period from March until May 2006. 
A total of 750 participants, selected through random digit dialing, completed a diary- 
based questionnaire about their social contacts during one randomly assigned weekday 
and one randomly assigned day in the weekend (not always in that order). In this 
paper, we follow the sampling scheme of the POLYMOD project and only consider one
day for each participant (Mossong et al., 2008b). The data set consists of
participant-related information such as age and gender, and details about each contact: age and
gender of the contacted person, and location, duration and frequency of the contact. In 
case the exact age of the contacted person was unknown, participants had to provide 
an estimated age range and the mean value is used as a surrogate. Further, a distinc- 
tion between two types of contacts was made: non-close contacts, defined as two-way 
conversations of at least three words in each other's proximity, and close contacts that
involve any sort of physical skin-to-skin touching. 

Teenagers (9-17y) filled in a simplified version of the diary and were closely followed 
up to anticipate interpretation problems. For children (< 9y), a parent or exceptionally 
another adult caregiver filled in the diary. One adult respondent made over 1000 con- 
tacts and was considered an outlier to the data set. This person is likely very influential 
and therefore excluded from the analyses presented here. Analyses are based on the 
remaining 749 participants. Using census data on population sizes of different age by 
household size combinations, weights are given to the participants in order to make the 
data representative of the Belgian population. In total, the 749 participants recorded 
12775 contacts of which 3 are omitted from analysis due to missing age values for the 
contacted person. For a more in-depth perspective on the Belgian contact survey and
the importance of contact rates on modeling infectious diseases, we refer to Hens et al.
(2009a).



2.2 Serological data 

Primary infection with VZV, also known as human herpes virus 3 (HHV-3), results 
in varicella, commonly known as chickenpox, and mainly occurs in childhood. After- 
wards, the virus becomes dormant in the body and may reactivate in a later stage, 
resulting in herpes zoster, commonly known as shingles. Infection with VZV occurs 
through direct or aerosol contact with infected persons. A person infected with
chickenpox is able to transmit the virus for about 7 days. Following Garnett and Grenfell
(1992) and Whitaker and Farrington (2004), we ignore chickenpox cases resulting from
contact with persons suffering from shingles. Zoster indeed has a limited impact on
transmission dynamics when considering large populations with no immunization
program (Ferguson et al., 1996).



In a period from November 2001 until March 2003, 2655 serum samples in Belgium 
were collected and tested for VZV. Together with the test results, gender and age of 
the individuals were recorded. In the data set, age ranges from 0 to 40 years and 6
individuals are younger than 6 months. Belgium has no mass vaccination program for
VZV. Further details on the data set can be found in Hens et al. (2008) and Hens et al.
(2009b).



3 Estimation of $R_0$ by imposing mixing patterns

3.1 Estimating transmission rates 

To describe transmission dynamics, a compartmental MSIR-model for a closed population
of size $N$ is considered. By doing so, we explicitly take into account the fact that,
in a first phase, newborns are protected by maternal antibodies and do not take part 
in the transmission process. We assume that mortality due to infection can be ignored, 
which is plausible for VZV in developed countries, and that infected individuals main- 
tain lifelong immunity after recovery. Further, demographic and endemic equilibrium 
are assumed, which means that the age-specific population sizes remain constant over 
time and that the disease is in an endemic steady state at the population level. For 
simplicity, we assume type I mortality defined as

$$
\exp\left(-\int_0^a \mu(s)\,ds\right) =
\begin{cases}
1, & \text{if } a < L, \\
0, & \text{if } a \geq L,
\end{cases}
$$

where $\mu(a)$ denotes the age-specific mortality rate. This implies that everyone survives
up to age $L$ and then promptly dies, which is a reasonable assumption when describing
transmission dynamics for VZV in Belgium (see also Whitaker and Farrington, 2004).



We make a similar assumption for the age-specific rate $\gamma(a)$ of losing maternal
antibodies, which we will denote as 'type I maternal antibodies':

$$
\exp\left(-\int_0^a \gamma(s)\,ds\right) =
\begin{cases}
1, & \text{if } a \leq A, \\
0, & \text{if } a > A,
\end{cases}
\tag{1}
$$

meaning that all newborns are protected by maternal antibodies until a certain age $A$
and then move to the susceptible class instantaneously. Under these assumptions, the
proportion of susceptibles is given by

$$
x(a) = \exp\left(-\int_A^a \lambda(s)\,ds\right), \quad \text{if } a > A, \tag{2}
$$

where $\lambda(a)$ denotes the age-specific force of infection, and $x(a) = 0$ if $a \leq A$.

If the mean duration of infectiousness $D$ is short compared to the timescale on
which transmission and mortality rate vary, the force of infection can be approximated
by (Anderson and May, 1991):

$$
\lambda(a) = \frac{ND}{L} \int_A^L \beta(a,a')\,\lambda(a')\,x(a')\,da', \tag{3}
$$

where $\beta(a,a')$ denotes the transmission rate, i.e. the per capita rate at which an
individual of age $a'$ makes an effective contact with a person of age $a$, per year.
Formula (3) reflects the so-called 'mass action principle', which implicitly assumes that
infectious and susceptible individuals mix completely with each other and move randomly
within the population.

Estimating transmission rates using seroprevalence data cannot be done analytically
since the integral equation (3) in general has no closed form solution. However, it is
possible to solve this numerically by turning to a discrete age framework, assuming a
constant force of infection in each age class. Denote the first age interval
$(a_{[1]}, a_{[2]})$ and the $j$th age interval $[a_{[j]}, a_{[j+1]})$, $j = 2, \ldots, J$,
where $a_{[1]} = A$ and $a_{[J+1]} = L$. Making use of (2), the prevalence of immune
individuals of age $a$ is now well approximated by (Anderson and May, 1991):

$$
\pi(a) = 1 - \exp\left(-\sum_{k=1}^{j-1} \lambda_k \left(a_{[k+1]} - a_{[k]}\right)
 - \lambda_j \left(a - a_{[j]}\right)\right), \tag{4}
$$



if a belongs to the jth age interval. Note that we allow the prevalence of immune 
individuals to vary continuously with age and that we do not summarize the binary 
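seroprevalence outcomes into a proportion per age class. Further, the force of infection
for age class $i$ equals ($i = 1, \ldots, J$):

$$
\lambda_i = \frac{ND}{L} \sum_{j=1}^{J} \beta_{ij}
\left[\exp\left(-\sum_{k=1}^{j-1} \lambda_k \left(a_{[k+1]} - a_{[k]}\right)\right)
 - \exp\left(-\sum_{k=1}^{j} \lambda_k \left(a_{[k+1]} - a_{[k]}\right)\right)\right], \tag{5}
$$

where $\beta_{ij}$ denotes the per capita rate at which an individual of age class $j$ makes an
effective contact with a person of age class $i$, per year. The transmission rates $\beta_{ij}$ make
up a $J \times J$ matrix, the so-called WAIFW-matrix.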
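The discrete-age quantities in (4) and (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0-based class indexing and the list `bounds` (holding $a_{[1]} = A, \ldots, a_{[J+1]} = L$) are choices made here, not part of the original analysis.

```python
import math

def cumulative_hazard(lam, bounds, j):
    """Sum of lam[k] * (bounds[k+1] - bounds[k]) over the classes k < j (0-based)."""
    return sum(lam[k] * (bounds[k + 1] - bounds[k]) for k in range(j))

def prevalence(a, lam, bounds):
    """Equation (4): proportion immune at age a, for A <= a < L."""
    if not bounds[0] <= a < bounds[-1]:
        raise ValueError("age must lie in [A, L)")
    # index of the age class that contains a
    j = max(k for k in range(len(lam)) if bounds[k] <= a)
    return 1.0 - math.exp(-cumulative_hazard(lam, bounds, j) - lam[j] * (a - bounds[j]))

def force_of_infection(beta, lam, bounds, N, D, L):
    """Right-hand side of equation (5), evaluated for every age class i."""
    J = len(lam)
    # x[j]: proportion susceptible at the lower bound of class j (x[J] is at age L)
    x = [math.exp(-cumulative_hazard(lam, bounds, j)) for j in range(J + 1)]
    return [N * D / L * sum(beta[i][j] * (x[j] - x[j + 1]) for j in range(J))
            for i in range(J)]
```

A force of infection vector is consistent with a given WAIFW-matrix when `force_of_infection(beta, lam, ...)` returns `lam` itself, which is why (5) has to be solved iteratively.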

Once the WAIFW-matrix is estimated, following Diekmann et al. (1990) and Farrington et al.
(2001), the basic reproduction number $R_0$ can be calculated as the dominant eigenvalue
of the $J \times J$ next generation matrix with elements ($i, j = 1, \ldots, J$):

$$
\frac{ND}{L} \left(a_{[i+1]} - a_{[i]}\right) \beta_{ij}. \tag{6}
$$

$R_0$ represents the number of secondary cases produced by a typical infected person
during his or her entire period of infectiousness, when introduced into an entirely sus- 
ceptible population with the exception of newborns who are passively immune through 
maternal antibodies. In the next section, we illustrate the traditional approach of 
imposing mixing patterns to estimate the WAIFW-matrix from seroprevalence data. 
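As an illustration of the eigenvalue calculation, the sketch below builds the next generation matrix with elements $\frac{ND}{L}(a_{[i+1]} - a_{[i]})\beta_{ij}$ and approximates its dominant eigenvalue by power iteration. Power iteration is an assumption made here to keep the example dependency-free; any standard eigenvalue routine would serve equally well. The function names are hypothetical.

```python
def next_generation_matrix(beta, bounds, N, D, L):
    """Elements (N*D/L) * (a_[i+1] - a_[i]) * beta_ij of the next generation matrix."""
    J = len(beta)
    return [[N * D / L * (bounds[i + 1] - bounds[i]) * beta[i][j] for j in range(J)]
            for i in range(J)]

def dominant_eigenvalue(M, iterations=1000, tol=1e-12):
    """Power iteration; assumes M is non-negative with a single dominant eigenvalue."""
    n = len(M)
    v = [1.0 / n] * n                      # start vector, entries sum to one
    value = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_value = sum(w)                 # sum(v) == 1, so sum(M v) estimates the eigenvalue
        if new_value == 0.0:
            return 0.0
        v = [x / new_value for x in w]     # renormalise so the entries sum to one again
        if abs(new_value - value) < tol * max(1.0, abs(new_value)):
            return new_value
        value = new_value
    return value
```

For homogeneous mixing ($\beta_{ij} = \beta$) the matrix has rank one and the result reduces to $\frac{ND}{L}\beta(L - A)$, which gives a convenient sanity check.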



3.2 Imposing mixing patterns 

The traditional approach of Anderson and May (1991) imposes different, somewhat ad
hoc, mixing patterns on the WAIFW-matrix. Note that, in the previous section, we
ended up with a system of $J$ equations with $J \times J$ unknown parameters (5) and thus
restrictions on these patterns are necessary. Among the proposals in the literature,
one distinguishes between several mixing assumptions such as homogeneous mixing
($\beta(a,a') = \beta$), proportional mixing ($\exists u : \beta(a,a') = u(a)u(a')$), separable mixing
($\exists u, v : \beta(a,a') = u(a)v(a')$) and symmetry ($\beta(a,a') = \beta(a',a)$). Note that the latter
two mixing assumptions require additional restrictions to be made. As illustrated by
Greenhalgh and Dietz (1994) and Van Effelterre et al. (2009), the structure imposed
on the WAIFW-matrix has a high impact on the estimate of $R_0$. In this section, we
assume the transmission rates to be constant within six discrete age classes ($J = 6$). We
follow Anderson and May (1991), Van Effelterre et al. (2009) and Ogunjimi et al. (2009)
and consider six mixing patterns, W1, ..., W6, based on prior knowledge of social mixing
behavior, to model the WAIFW-matrix for VZV. Each pattern is parameterized by six
transmission parameters.

In order to estimate the transmission parameters $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_6)^T$ from
seroprevalence data, we follow an iterative procedure from Farrington et al. (2001) and
Kanaan and Farrington. First, one assumes plausible starting values for $\boldsymbol{\beta}$ and
solves (5) iteratively for the piecewise constant force of infection
$\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_6)^T$, which in its turn can be contrasted to the serology.
Second, this procedure is repeated under the constraint $\boldsymbol{\beta} \geq 0$, until the Bernoulli
loglikelihood

$$
\sum_{i=1}^{n} \left\{ y_i \log[\pi(a_i)] + (1 - y_i) \log[1 - \pi(a_i)] \right\},
$$

has been maximized. Here, $n$ denotes the size of the serological data set, $y_i$ denotes
a binary variable indicating whether subject $i$ had experienced infection before age $a_i$
and the prevalence $\pi(a_i)$ is obtained from (4).
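The Bernoulli loglikelihood above is straightforward to evaluate once the model prevalences are available. A minimal sketch follows; the function name and the clipping constant `eps` are assumptions made here, the latter being a purely numerical guard rather than part of the model.

```python
import math

def bernoulli_loglik(y, pi, eps=1e-12):
    """Sum of y_i*log(pi_i) + (1 - y_i)*log(1 - pi_i) over all subjects."""
    total = 0.0
    for yi, p in zip(y, pi):
        p = min(max(p, eps), 1.0 - eps)   # avoid log(0) when a prevalence hits 0 or 1
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total
```

In the iterative procedure, `pi` would be recomputed from the current force of infection at every step and the loglikelihood passed to a constrained optimizer.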



3.3 Application to the data 

For the remainder of the paper, the following parameters, specific for Belgium anno 2003
(Eurostat, 2007; FOD Economie Afdeling Statistiek, 2006), are kept constant when
estimating the WAIFW-matrix and $R_0$: size of the population aged 0 to 80 years,
N = 9943749, and life expectancy at birth, L = 80. The mean duration of infectious- 
ness for VZV is taken D = 7/365. Type I mortality and type I maternal antibodies 
with age A = 0.5, are assumed. Removing individuals younger than 6 months, the size 
of the serological data set becomes n = 2649. 

In this application, the population is divided into six age classes taking into
account the schooling system in Belgium, following Van Effelterre et al. (2009): (0.5,2),
[2,6), [6,12), [12,19), [19,31), [31,80). The last age class has a wide range because
the serological data set only contains information for individuals up till 40 years.
The following ML-estimate for $\boldsymbol{\lambda}$ is obtained assuming a piecewise constant force
of infection and using constrained optimization to ensure monotonicity ($\pi'(a) \geq 0$):
$\hat{\boldsymbol{\lambda}}_{ML} = (0.313, 0.304, 0.246, 0, 0.082, 0)^T$. A graphical display of the fit is
presented in Figure 1 and a dashed line is used to indicate the estimated prevalence and
force of infection for the age interval [40, 80) which lacks serological information.




Figure 1: Estimated prevalence (upper curve) and force of infection (lower curve) for VZV assuming a 
piecewise constant force of infection. The dots represent the observed serological data with 
size proportional to the corresponding sample size. The dashed lines are used to indicate 
the estimated prevalence and force of infection for the age interval [40,80), which lacks 
serological information. 



During the estimation process, non-identifiability problems occur for mixing patterns
W1, W5 and W6, which is related to the fact that
$\hat{\lambda}_{ML,4} = \hat{\lambda}_{ML,6} = 0$. Therefore, these mixing patterns are left out of
further consideration. For the remaining three, ML-estimates for $\boldsymbol{\beta}$ and $R_0$ are
presented in Table 1. Note that mixing pattern W4 has a regular configuration for the
data, whereas W2 and W3 are non-regular since unconstrained ML-estimation induces
negative estimates for components of $\boldsymbol{\beta}$ (Farrington et al., 2001). The estimate of
$R_0$ ranges from 3.37 to 4.21. A 95% bootstrap-based percentile confidence interval for
$R_0$ is presented as well, applying a non-parametric bootstrap by taking B = 1000 samples
with replacement from the serological data. The fit of the three mixing patterns can be
compared using model selection criteria, such as AIC and BIC (Schwarz, 1978). As can be
seen from Table 1, the AIC-values (equivalent to BIC here) are virtually equal and do
not provide any basis to guide the choice of a mixing pattern.

Table 1: Estimates for the transmission parameters (multiplied by 10^4) and for $R_0$, obtained by
imposing mixing patterns W2, W3 and W4 on the WAIFW-matrix.

Pattern   β1      β2      β3      β4      β5      β6      R0     95% CI for R0    AIC
W2        1.413   1.335   1.064   0.000   0.343   0.000   3.51   [3.07, 13.42]    1372.819
W3        1.362   1.441   0.873   0.000   0.343   0.000   3.37   [2.81, 13.38]    1372.819
W4        1.334   1.298   1.049   0.000   0.349   0.000   4.21   [3.69, 13.13]    1372.756

Note that these results differ somewhat from those obtained by Van Effelterre et al.
(2009), where a different data set for VZV serology was used, collected from a large
laboratory in the city of Antwerp between October 1999 and April 2000.



4 Estimation of $R_0$ using data on social contacts

4.1 Constant proportionality of the transmission rates 

In the previous section, we have illustrated some caveats involved in the traditional 
approach of imposing mixing patterns on the WAIFW-matrix. In general, the choice 
of the structures as well as the choice of the age classes are somewhat ad hoc. Since 
evidence for mixing patterns is thought to be found in social contact data, i.e. gov- 
erning contacts with high transmission potential, an alternative approach to estimate 
transmission parameters has emerged: augmenting seroprevalence data with data on
social contacts. In Wallinga et al. (2006), it was argued that $\beta(a,a')$ is proportional
to $c(a,a')$, the per capita rate at which an individual of age $a'$ makes contact with a
person of age $a$, per year:

$$
\beta(a,a') = q \cdot c(a,a'). \tag{8}
$$

We will refer to this assumption as the 'constant proportionality' assumption, since
$q$ represents a constant disease-specific factor. Translating this assumption into the
discrete framework with age classes $(a_{[1]}, a_{[2]}), [a_{[2]}, a_{[3]}), \ldots, [a_{[J]}, a_{[J+1]})$ is
straightforward ($i, j = 1, \ldots, J$): $\beta_{ij} = q \cdot c_{ij}$, where $c_{ij}$ denotes the per capita rate
at which an individual of age class $j$ makes contact with a person of age class $i$, per year.
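One consequence of the constant proportionality assumption is worth spelling out. The next generation matrix of Section 3.1 is linear in the transmission rates, so once the contact rates are held fixed, the basic reproduction number scales linearly with the proportionality factor. The symbols $M^{(c)}$ and $\rho(\cdot)$ (dominant eigenvalue) are notation introduced for this illustration only:

$$
M^{(c)}_{ij} = \frac{ND}{L}\left(a_{[i+1]} - a_{[i]}\right) c_{ij},
\qquad
R_0 = \rho\left(q\,M^{(c)}\right) = q\,\rho\left(M^{(c)}\right).
$$

An estimate or confidence limit for $q$ therefore translates directly into the corresponding value for $R_0$ by multiplication with $\rho(M^{(c)})$.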

The proportionality factor and the contact rates are not identifiable from serological 
data only. Therefore, in order to estimate the WAIFW-matrix, one first needs to 
estimate the contact rates using social contact data. Following the Belgian contact 
survey, 'making contact with' is then defined as a two-way conversation of at least
three words in each other's proximity and/or any sort of physical skin-to-skin touching
(Section 2.1). In Section 4.3.1, we will refine on this definition and consider specific types
of contact with high transmission potential. In a second step, keeping the estimated
contact rates fixed, we estimate the proportionality factor from serological data using
the estimation method described in Section 3.2.

4.2 Estimating contact and transmission rates



Consider the random variable $Y_{ij}$, i.e. the number of contacts in age class $j$ during
one day as reported by a respondent in age class $i$ ($i, j = 1, \ldots, J$), which has observed
values $y_{ij,t}$, $t = 1, \ldots, T_i$, where $T_i$ denotes the number of participants in the contact
survey belonging to age class $i$. Now define $m_{ij} = E(Y_{ij})$, i.e. the mean number of
contacts in age class $j$ during one day as reported by a respondent in age class $i$. The
elements $m_{ij}$ make up a $J \times J$ matrix, which is called the 'social contact matrix'. Now,
the contact rates $c_{ij}$ are related to the social contact matrix as follows:

$$
c_{ij} = 365 \cdot \frac{m_{ji}}{w_i},
$$

where $w_i$ denotes the population size in age class $i$, obtained from demographical data.
When estimating the social contact matrix, the reciprocal nature of contacts needs to
be taken into account (Wallinga et al., 2006):

$$
m_{ij} w_i = m_{ji} w_j, \tag{9}
$$

which means that the total number of contacts from age class $i$ to age class $j$ must
equal the total number of contacts from age class $j$ to age class $i$.
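The relation between the social contact matrix and the contact rates, together with the reciprocity condition (9), can be illustrated as follows. The averaging step used to enforce reciprocity is an assumption made for this sketch (a common, simple choice); it is not claimed to be the constraining step used in the analysis. Function names are hypothetical.

```python
def reciprocal_contact_matrix(m, w):
    """Replace m_ij*w_i and m_ji*w_j by their mean so that condition (9) holds."""
    J = len(m)
    out = [[0.0] * J for _ in range(J)]
    for i in range(J):
        for j in range(J):
            total = 0.5 * (m[i][j] * w[i] + m[j][i] * w[j])  # averaged total contacts
            out[i][j] = total / w[i]
    return out

def contact_rates(m, w, days_per_year=365):
    """c_ij = 365 * m_ji / w_i: yearly per capita contact rates."""
    J = len(m)
    return [[days_per_year * m[j][i] / w[i] for j in range(J)] for i in range(J)]
```

When (9) holds, $m_{ji}/w_i = m_{ij}/w_j$, so the resulting matrix of contact rates is symmetric.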

4.2.1 Bivariate smoothing 

The elements $m_{ij}$ of the social contact matrix are estimated from the contact data
using a bivariate smoothing approach as described by Wood (2006). In contrast with
the maximum likelihood approach as presented by Wallinga et al. (2006), the average
number of contacts is modeled as a two-dimensional continuous function over age of
respondent and contact, giving rise to a 'contact surface'. The basis is a tensor-product
spline derived from two smooth functions of the respondent's and contact's age, ensuring
flexibility:

$$
Y_{ij} \sim \mathrm{NegBin}(m_{ij}, k), \quad \text{where } g(m_{ij}) =
\sum_{\ell=1}^{K} \sum_{p=1}^{K} \delta_{\ell p}\, b_\ell(a_{[i]})\, d_p(a_{[j]}), \tag{10}
$$

where $g$ is some link function, $\delta_{\ell p}$ are unknown parameters, and $b_\ell$ and $d_p$ are known
basis functions for the marginal smoothers. To allow for overdispersion, we assume that
the contact counts $Y_{ij}$ are independently negative binomial distributed with mean $m_{ij}$,
dispersion parameter $k$ and variance $m_{ij} + m_{ij}^2/k$.
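To make the overdispersion concrete, the sketch below evaluates the negative binomial probability mass function in its mean and dispersion parameterization and recovers the mean and the variance $m + m^2/k$ numerically. The function names and the truncation point `y_max` are assumptions of this illustration.

```python
import math

def negbin_pmf(y, mean, k):
    """P(Y = y) for a negative binomial with the given mean and dispersion k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mean)) + y * math.log(mean / (k + mean)))
    return math.exp(log_p)

def negbin_moments(mean, k, y_max=2000):
    """Mean and variance obtained by summing the pmf over 0..y_max."""
    probs = [negbin_pmf(y, mean, k) for y in range(y_max + 1)]
    m1 = sum(y * p for y, p in enumerate(probs))
    m2 = sum(y * y * p for y, p in enumerate(probs))
    return m1, m2 - m1 ** 2
```

For a mean of 10 contacts and $k = 2$ the variance is $10 + 100/2 = 60$, six times the Poisson variance; as $k$ grows the distribution approaches the Poisson case.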

The basis dimension, $K$, should be chosen large enough in order to fit the data
well, but small enough to maintain reasonable computational efficiency (Wood, 2006).
For tensor-product smoothers, the upper limit of the degrees of freedom is given by
the product of the $K$ values provided for each marginal smooth, minus one, for the
identifiability constraint. However, the actual effective degrees of freedom are also
controlled by the degree of penalization selected during fitting.

Thin plate regression splines are used to avoid the selection of knots and a log link
is used in model (10). Diary weights, as discussed in Section 2.1, are taken into account
in the smoothing process. By applying a smooth-then-constrain approach as proposed
by Mammen et al. (2001), the reciprocal nature of contacts (9) is taken into account.

4.2.2 Estimating the contact rates

The smoothing is performed in R with the gam function from the mgcv package (Wood,
2006), considering one year age intervals, [0, 1), [1, 2), ..., [100, 101). An informal check
(by comparing the estimated degrees of freedom and the basis dimension) shows that
$K = 11$ is a satisfactory basis dimension choice for the Belgian contact data. In
Figure 2, the estimated contact surface obtained with the bivariate smoothing approach
is displayed. The smoothing approach seems well able to capture important features of
human contacting behavior. Three components clearly arise in the smoothed contact
surface. First of all, one can see a pronounced assortative structure on the diagonal,
representing high contact rates between individuals of the same age. Second, an
off-diagonal parent-child component comes forward, reflecting a very natural form of
contact between parents and children, which might be important in modeling certain
childhood infections such as parvovirus B19 (Mossong et al., 2008a). Finally, there even
seems to be evidence for a grandparent-grandchild component.





Figure 2: Perspective (left) and image (right) plot of the estimated contact rates $c_{ij}$ obtained with
bivariate smoothing. The X- and Y-axis represent age of the respondent and age of the
contact, respectively.



Except for the assortativeness, these features are not reflected by the contact rates
estimated by maximizing the likelihood of the 'saturated model' proposed by Wallinga et al.
(2006), considering the same six age classes used in Section 3.3 (results omitted here).
Furthermore, AIC and BIC criteria indicate the smoothing method to outperform
Wallinga et al. (2006)'s saturated model, showing improved estimation of the contact
surface using nonparametric techniques.



4.2.3 Estimating $R_0$

Under the constant proportionality assumption (8), we are now able to estimate the
WAIFW-matrix for VZV using serological data. Keeping the estimated contact rates $c_{ij}$
fixed, we estimate the proportionality factor $q$ using the estimation method described
in Section 3.2. In Table 2, estimates for $q$ and $R_0$ together with their corresponding
95% profile likelihood confidence intervals, and AIC-values, are presented for the
bivariate smoothing approach and the 'saturated model' proposed by Wallinga et al.
(2006). The results are fairly similar, though the saturated model induces a smaller
AIC-value compared to the smoothing approach. As can be seen from both model fits
in Figure 3, contact rate estimates between children will mainly determine the fit to
the serological data, limiting the advantage of a better contact surface estimate. Note
that the 95% confidence intervals in Table 2 are implausibly narrow, resulting from the
fact that the estimated contact rates are held constant.

Table 2: ML-estimates for the proportionality factor and $R_0$, obtained from contact rates estimated
by bivariate smoothing and Wallinga et al. (2006)'s saturated model, assuming constant
proportionality.

Model for c_ij   q       95% CI for q      R0      95% CI for R0     AIC
Smoothing        0.132   [0.124, 0.140]    15.69   [14.74, 16.69]    1386.618
Saturated        0.124   [0.117, 0.132]    14.08   [13.26, 14.94]    1377.146




Figure 3: Estimated prevalence (upper curve) and force of infection (lower curve) obtained from
contact rates estimated using maximum likelihood for Wallinga et al. (2006)'s saturated model
(left) and using bivariate smoothing (right).



4.3 Refinements to the social contact data approach 

The aim is to clearly disentangle the WAIFW-matrix into the contact process and 
the transmission potential. Therefore, in the following, contact rates are estimated 
using a bivariate smoothing approach, since this method outperforms the saturated
model estimated using maximum likelihood as proposed by Wallinga et al. (2006)
(Section 4.2.2). Following Ogunjimi et al. (2009) and Melegaro et al. (2009), contacts with
high transmission potential are filtered from the social contact data. Further, to
improve statistical inference, we present a non-parametric bootstrap approach, explicitly
accounting for all sources of variability.



4.3.1 Contacts with high transmission potential 

The aim is to trace the type of contact which is most likely to be responsible for VZV 
transmission, hereby exploiting the following details provided on each contact: duration 
and type of contact, which is either close or non-close (Section 2.1). Five types of contact
are considered and we will explore which one induces the best fit to the serological data.
First, the contact rates $c(a, a')$ are estimated using the complete contact data set as we
did in Section 4.2.3 and further, four specific types of contact with high transmission
potential for VZV are selected according to Ogunjimi et al. (2009) and Melegaro et al.
(2009):

Model   Parameter   Type of contact
C1      q1          all contacts
C2      q2          close contacts
C3      q3          close contacts > 15 minutes
C4      q4          close contacts and non-close contacts > 1 hour
C5      q5          close contacts > 15 minutes and non-close contacts > 1 hour

Assuming constant proportionality, maximum likelihood estimates for the transmission
parameters $q_k$ ($k = 1, \ldots, 5$) and for the basic reproduction number $R_0$, together
with their corresponding 95% profile likelihood confidence intervals (first entry), are
presented in Table 3. For each model the AIC-value, AIC difference
$\Delta_k = \mathrm{AIC}_k - \mathrm{AIC}_{\min}$, Akaike weight

$$
w_k = \frac{\exp\left(-\frac{1}{2}\Delta_k\right)}{\sum_{\ell} \exp\left(-\frac{1}{2}\Delta_\ell\right)}
$$

and evidence ratio (ER) $w_{\min}/w_k$, are calculated following Burnham and Anderson
(2002), where $\mathrm{AIC}_{\min}$ and $w_{\min}$ correspond to the model with the smallest AIC value.
Recall that the AIC is an estimate of the expected, relative Kullback-Leibler (K-L) dis- 
tance, whereas the K-L distance embodies the information lost when an approximating 
model is used instead of the unknown, true model. A given Akaike weight is consid- 
ered as the weight of evidence in favor of a model k being the actual K-L best model 
for the situation at hand, given the data and the set of candidate models considered. 
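The model selection quantities are simple to reproduce from the AIC-values reported in Table 3. The following sketch computes AIC differences, Akaike weights and evidence ratios; the function name is hypothetical and the AIC-values are copied from the table.

```python
import math

def akaike_summary(aic):
    """Return AIC differences, Akaike weights and evidence ratios for a list of AIC values."""
    aic_min = min(aic)
    delta = [a - aic_min for a in aic]
    rel = [math.exp(-0.5 * d) for d in delta]    # relative likelihood of each model
    total = sum(rel)
    weights = [r / total for r in rel]
    w_best = max(weights)                        # weight of the model with the smallest AIC
    ratios = [w_best / w for w in weights]
    return delta, weights, ratios

# AIC-values for models C1..C5 as reported in Table 3
aic_values = [1386.618, 1379.581, 1374.958, 1380.354, 1376.068]
delta, weights, ratios = akaike_summary(aic_values)
```

Rounded to three decimals, the weights for C3 and C5 come out as 0.574 and 0.329, in agreement with the table.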

According to the AIC-criterion, although AIC differences are minor, the contact
matrix consisting of close contacts longer than 15 minutes (model C3) implies the best
fit to the seroprevalence data. A graphical representation of the estimated prevalence
and force of infection is omitted here, since the result is very close to the one obtained
for model C1 in Figure 3. Further, there is evidence for model C5 as well, having
an Akaike weight of 0.329 and an evidence ratio of 1.7. The latter model adds non- 
close contacts longer than one hour to model C3 and therefore these models are closely 
related. 






Table 3: ML-estimates for the proportionality factor and $R_0$, 95% profile likelihood confidence
intervals (first entry), 95% bootstrap-based percentile confidence intervals (second entry) and
several measures related to model selection, obtained from contact rates estimated using
bivariate smoothing, considering different types of contact C1-C5, assuming constant
proportionality.

Model   q_k     95% CI for q_k    R0      95% CI for R0     AIC        Δ_k      w_k     ER
C1      0.132   [0.124, 0.140]    15.69   [14.74, 16.69]    1386.618   11.660   0.002   340.4
                [0.103, 0.175]            [12.34, 21.41]
C2      0.160   [0.150, 0.169]    10.24   [9.65, 10.85]     1379.581   4.623    0.057   10.1
                [0.126, 0.208]            [8.21, 13.68]
C3      0.173   [0.163, 0.184]    8.68    [8.18, 9.20]      1374.958   0.000    0.574   1.0
                [0.133, 0.221]            [6.89, 11.34]
C4      0.145   [0.136, 0.154]    11.73   [11.05, 12.47]    1380.354   5.396    0.039   14.9
                [0.113, 0.188]            [9.41, 15.95]
C5      0.156   [0.147, 0.166]    10.40   [9.79, 11.04]     1376.068   1.110    0.329   1.7
                [0.119, 0.204]            [8.05, 14.10]

4.3.2 Non-parametric bootstrap 

We explicitly acknowledge that up till now, by keeping the estimated contact rates 
fixed, we have ignored the variability originating from the contact data. In order to 
assess sampling variability for the social contact data and the serological data alto- 
gether, we will use a non-parametric bootstrap approach. Furthermore, building in a 
randomization process, uncertainty concerning age is accounted for. After all, in the 
social contact data, ages of respondents are rounded up, which is also the case for some 
individuals in the serological data set. Concerning the age of contacts, a lower and up- 
per age limit is given by the respondents. Instead of using the mean value of these age 
limits, a random draw is now taken from the uniform distribution on the corresponding 
age interval. In summary, each bootstrap cycle consists of the following six steps: 

1. randomize ages in the social contact data and the serological data set; 

2. take a sample with replacement from the respondents in the social contact data; 

3. recalculate diary weights based on age and household size of the selected respon- 
dents; 

4. estimate the social contact matrix (smooth-then-constrain approach); 

5. take a sample with replacement from the serological data; 

6. estimate the transmission parameters and $R_0$.

This bootstrap approach allows one to calculate bootstrap confidence intervals for the 
transmission parameters and for the basic reproduction number, which take into ac- 
count all sources of variability. 
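
The six steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the paper: the record layout (`resp`, `low`, `high`, `positive`), the toy data and the function `toy_estimate` are all hypothetical, only the ages of contacts are randomized, and steps 3 and 4 (diary weights and smoothing) are delegated to the user-supplied `estimate` function.

```python
import random

def bootstrap_cycle(contacts, serology, estimate, rng):
    """One bootstrap cycle; `estimate` is a hypothetical user-supplied function."""
    # Step 1: draw each contact age uniformly from its reported [low, high] interval.
    randomized = [
        {"resp": c["resp"], "age_contact": rng.uniform(c["low"], c["high"])}
        for c in contacts
    ]
    # Step 2: resample respondents (not individual contacts) with replacement.
    by_resp = {}
    for c in randomized:
        by_resp.setdefault(c["resp"], []).append(c)
    resp_ids = sorted(by_resp)
    drawn = [rng.choice(resp_ids) for _ in resp_ids]
    contact_sample = [c for r in drawn for c in by_resp[r]]
    # Steps 3-4 (diary weights, contact matrix) would happen inside `estimate`.
    # Step 5: resample the serological records with replacement.
    sero_sample = [rng.choice(serology) for _ in serology]
    # Step 6: estimate the quantity of interest from both resampled data sets.
    return estimate(contact_sample, sero_sample)

def percentile_ci(values, level=0.95):
    """Simple empirical percentile interval from sorted bootstrap replicates."""
    ordered = sorted(values)
    lo = ordered[int((1 - level) / 2 * (len(ordered) - 1))]
    hi = ordered[int((1 + level) / 2 * (len(ordered) - 1))]
    return lo, hi

def toy_estimate(contact_sample, sero_sample):
    """Purely illustrative statistic; NOT the estimator of the paper."""
    mean_age = sum(c["age_contact"] for c in contact_sample) / len(contact_sample)
    prevalence = sum(s["positive"] for s in sero_sample) / len(sero_sample)
    return mean_age * prevalence

rng = random.Random(2024)
# Hypothetical toy data: 20 respondents with 5 contacts each, 40 serological records.
contacts = [{"resp": i % 20, "low": 5 + i % 30, "high": 10 + i % 30} for i in range(100)]
serology = [{"age": a, "positive": a > 6} for a in range(1, 41)]

results = [bootstrap_cycle(contacts, serology, toy_estimate, rng) for _ in range(200)]
lower, upper = percentile_ci(results)
print(f"95% percentile interval: [{lower:.2f}, {upper:.2f}]")
```

Resampling whole respondents rather than single contacts keeps the contacts of one diary together, which mirrors step 2 of the procedure.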

The impact on statistical inference is now illustrated for the models considered in
the previous section. Nine hundred bootstrap samples are taken from the contact data
and from the serological data simultaneously, while ages are being randomized. Only
B = 587 bootstrap samples lead to convergence in all five smoothing procedures, which
might be induced by the sparse structure of the contact data. However, by individually
monitoring the non-converging gam functions, convergence was reached after all, and a
comparison of the bootstrap results showed little difference whether or not these samples
were included. 95% percentile confidence intervals for q and $R_0$ are calculated based
on the B = 587 bootstrap samples (see Table 3, second entry). Taking into account
sampling variability for the social contact data has a noticeable impact, as can be seen
from the wider 95% confidence intervals.



5 Age-dependent proportionality of the transmission rates 

The proportionality factor q might depend on several characteristics related to suscep- 
tibility and infectiousness, which could be ethnic-, climate-, disease- or age-specific. 
Examples of age-specific characteristics related to susceptibility and infectiousness in- 
clude the mean infectious period, mucus secretion and hygiene. In the situation of 
seasonal and pandemic influenza this has been established and used in realistic simulation
models (see e.g. Cauchemez et al. (2004) and Longini et al. (2005)). Furthermore,
the conversational and physical contacts reported in the diaries serve as proxies of those
events by which an infection can be transmitted. For example, sitting close to someone 
in a bus without actually touching each other, may also lead to transmission of infec- 
tion. In light of these discrepancies, q can be considered as an age-specific adjustment 
factor which relates the true contact rates underlying infectious disease transmission to 
the social contact proxies. 

In view of this, we will explore whether q varies with age, an assumption we will
refer to as 'age-dependent proportionality':

$$\beta(a,a') = q(a,a') \cdot c(a,a'), \qquad (11)$$

which in the discrete framework turns into: $\beta_{ij} = q_{ij} \cdot c_{ij}$ ($i,j = 1,\ldots,J$). In the
previous section, it was observed that, under the constant proportionality assumption,
close contacts longer than 15 minutes imply the best fit to the serological data for VZV.
Therefore in the following, the contact rate is modeled using close contacts longer than
15 minutes and we will elaborate on this particular model by assuming age dependence.
First, discrete structures are applied in order to model the age-dependent proportionality
factor and second, 'continuous' loglinear regression models are considered for the
same purpose. Finally, we assess the level of model selection uncertainty and calculate
a model averaged estimate for the basic reproduction number.



5.1 Discrete structures 

The proportionality factor $q_{ij}$ is now allowed to differ between age classes. Discrete
matrix structures, involving two transmission parameters $\gamma_1$ and $\gamma_2$, are explored in
modeling $q_{ij}$. Five models are considered, which fit the following structures for $q_{ij}$ to
the seroprevalence data:



M 1 = 11 , M 2 = I 11 I , M s 

V 72 72 y 



7i 


7i 


72 


72 


7i 


72 


7i 


72 



71 72 

72 7i 



The population is divided into two age classes, namely [0.5, 12) and [12, 80), a choice
based on the dichotomy of the population according to the schooling system in Belgium
(Section 3.3), yielding the smallest AIC-value. Note that higher order extensions,
considering more parameters and/or number of age classes, were fitted to the serological
data as well. The improvement in loglikelihood, however, does not outweigh the
increase in the number of transmission parameters.
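
To make the discrete construction concrete, the sketch below builds a proportionality matrix for the two age classes and multiplies it elementwise with a contact matrix, following $\beta_{ij} = q_{ij} \cdot c_{ij}$. It assumes the assortative structure that the text attributes to model M3 (one parameter within an age class, one between classes) and uses the M3 point estimates from Table 4. The age grid and the contact rates are hypothetical values chosen only for illustration.

```python
def proportionality_matrix(ages, gamma1, gamma2, cutoff=12.0):
    """Assortative two-class structure: gamma1 within an age class, gamma2 between classes."""
    def age_class(age):
        return 0 if age < cutoff else 1
    return [
        [gamma1 if age_class(a_sus) == age_class(a_inf) else gamma2 for a_inf in ages]
        for a_sus in ages
    ]

def transmission_rates(q, c):
    """Elementwise product beta_ij = q_ij * c_ij."""
    return [[q_ij * c_ij for q_ij, c_ij in zip(q_row, c_row)]
            for q_row, c_row in zip(q, c)]

ages = [5, 10, 20, 40]        # hypothetical age-class midpoints
contact_rates = [             # hypothetical contact rates c_ij
    [8.0, 3.0, 1.0, 2.0],
    [3.0, 9.0, 1.5, 2.0],
    [1.0, 1.5, 4.0, 1.0],
    [2.0, 2.0, 1.0, 3.0],
]
# Point estimates for M3 as reported in Table 4.
q = proportionality_matrix(ages, gamma1=0.185, gamma2=0.069)
beta = transmission_rates(q, contact_rates)
for row in beta:
    print([round(b, 3) for b in row])
```

Rows index the age of the susceptible and columns the age of the infectious person, in line with the reading of the first column given in the text.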

Notice that the structures of M1-M5 resemble the mixing patterns imposed on the
WAIFW-matrix in the traditional Anderson and May (1991) approach. We would like
to emphasize that the method proposed here differs greatly from the latter, since the
WAIFW-matrix is now estimated using the estimated contact rates: $\beta_{ij} = q_{ij} \cdot c_{ij}$.
Hence, in contrast with the approach of Anderson and May (1991), who estimate $\beta_{ij}$
by fixing the structure of the mixing pattern, in our approach we estimate the contact
pattern from the survey data and use several proportionality structures to select the
best model from which the $\beta_{ij}$ are estimated.

Table 4 displays ML-estimates for $\gamma_1$, $\gamma_2$ and the basic reproduction number $R_0$,
together with their corresponding 95% percentile confidence intervals (B = 603 bootstrap
samples converged out of 700). For model M4, $\gamma_2$ is non-identifiable, and unconstrained
optimization of model M5 would not lead to convergence. According to the
AIC-criterion, the remaining models fit equally well and are informative with respect to
VZV transmission dynamics. Most likely, this is due to the fact that the main transmission
routes for VZV are between children and from infectious children to susceptible
adults, embodied by the first column $(\gamma_1, \gamma_2)^T$. The three models result in approximately
the same estimates for $\gamma_1$ and $\gamma_2$ and consequently the differences in AIC are
only minor.

It is clear from Table 4 that we estimate a difference in transmissibility between those
younger and older than 12 years (about 0.18 and 0.07, respectively). This difference
cannot be solely explained by the estimated contact rates. A possible explanation is that
when infectious children make close contact with susceptible children during a sufficient
amount of time, the probability of effective VZV transmission is higher compared to
the same situation with susceptible adults. Another potential cause is underreporting
of contacts between children. After all, up to the age of eight, the contact diaries
were filled in by the parents, which may have induced some reporting bias (Hens et al.,
2009a).



5.2 Continuous modeling 

As opposed to the previous section, the proportionality factor q(a,a') is now allowed to vary
continuously over age. Loglinear regression models are considered for q(a,a'), since we
expect an exponential decline of q over a due to hygienic habits as well as an exponential
decline of q over a' due to decreasing mucus secretion. The following loglinear models
are fitted to the data:

$$
\begin{aligned}
M_6 &: \; \log\{q(a)\} = \gamma_0 + \gamma_1 a, \\
M_7 &: \; \log\{q(a)\} = \gamma_0 + \gamma_1 a + \gamma_2 a^2, \\
M_8 &: \; \log\{q(a')\} = \gamma_0 + \gamma_1 a', \\
M_9 &: \; \log\{q(a')\} = \gamma_0 + \gamma_1 a' + \gamma_2 (a')^2, \\
M_{10} &: \; \log\{q(a,a')\} = \gamma_0 + \gamma_1 a + \gamma_2 a'.
\end{aligned}
$$

Table 4: Candidate models for the proportionality factor together with ML-estimates for the
transmission parameters and $R_0$, 95% bootstrap-based percentile confidence intervals, and several
measures related to model selection.

Model  Parameter   Estimate  95% CI            $R_0$  95% CI for $R_0$  K  AIC       $\Delta_k$  $w_k$  ER
C3     $q$         0.173     [0.133, 0.221]    8.68   [6.89, 11.34]     1  1374.958  8.884       0.003  84.9
M1     $\gamma_1$  0.185     [0.136, 0.244]    4.79   [4.15, 9.98]      2  1366.306  0.232       0.261  1.1
       $\gamma_2$  0.079     [0.006, 0.196]
M2     $\gamma_1$  0.183     [0.138, 0.240]    5.37   [4.47, 9.68]      2  1366.285  0.211       0.264  1.1
       $\gamma_2$  0.078     [0.006, 0.187]
M3     $\gamma_1$  0.185     [0.136, 0.244]    a. lb  [D.OZ, ll.ZOj     2  1366.074  0.000       0.293  1.0
       $\gamma_2$  0.069     [0.006, 0.199]
M6     $\gamma_0$  -1.622    [-2.028, -1.212]  5.79   [4.63, 12.60]     2  1368.709  2.635       0.079  3.7
       $\gamma_1$  -0.023    [-0.067, 0.016]
M7     $\gamma_0$  -1.720    [-2.441, -1.182]  5.03   [4.20, 1318.68]   3  1368.325  2.251       0.095  3.1
       $\gamma_1$  0.014     [-0.086, 0.305]
       $\gamma_2$  -0.002    [-0.024, 0.001]
M8     $\gamma_0$  -1.517    [-2.224, -0.446]  3.55   [1.76, 159.96]    2  1374.324  8.250       0.005  61.9
       $\gamma_1$  -0.065    [-0.403, 0.064]



Model M6 models log q as a first-degree function of the age of the susceptible, a, and model
M7 allows for an additional quadratic effect of age, $a^2$. Models M8 and M9 are the
analogues of M6 and M7 for the age of the infectious person, a'. Finally, M10 models q
as an exponential function of a and a' simultaneously. For model M9, no convergence
was obtained and model M10 gives rise to an estimated proportionality factor which is
exponentially increasing over a', inducing unrealistically large estimates for q at older
ages.
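
The behavior of these loglinear models is easy to inspect numerically. The sketch below evaluates q(a) for M6 and M7 at the point estimates reported in Table 4; the helper `q_loglinear` is an illustrative name of ours, and the positive-slope example at the end uses hypothetical coefficients, chosen only to show how quickly q grows at older ages once the slope is positive.

```python
import math

def q_loglinear(age, gammas):
    """q(age) = exp(gamma_0 + gamma_1 * age + gamma_2 * age**2 + ...)."""
    return math.exp(sum(g * age ** k for k, g in enumerate(gammas)))

m6 = (-1.622, -0.023)          # ML estimates for M6 as reported in Table 4
m7 = (-1.720, 0.014, -0.002)   # ML estimates for M7 as reported in Table 4

for age in (1, 12, 40, 80):
    print(age, round(q_loglinear(age, m6), 3), round(q_loglinear(age, m7), 3))

# Hypothetical coefficients with a positive slope: q increases exponentially with age.
increasing = (-1.5, 0.05)
print(round(q_loglinear(80, increasing), 2))
```

A proportionality factor that keeps growing with age inflates the transmission rates among adults, which is why such fits lead to very large values of the basic reproduction number.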

Maximum likelihood estimates for the model parameters and the basic reproduction
number $R_0$ are presented in Table 4, together with the corresponding 95% percentile
confidence intervals (B = 603 bootstrap samples converged out of 700). According to
the AIC-criterion, M6 and M7 fit equally well. Allowing the proportionality factor to
vary by age of infectious persons does not seem to substantially improve model fit, as
can be seen by comparing the AIC-values of C3 and M8.

Clearly for models M7 and M8, the upper limits of the confidence intervals for
$R_0$ are very large, as a consequence of estimated proportionality factors which are
exponentially increasing over a and a', respectively. This result originates from two
things: first, there is lack of serological information for individuals aged 40 and older,
and second, VZV is highly prevalent in the population and most individuals become
infected with VZV before the age of ten. Mathematically the latter means that from a
certain age on, $\pi(a) \approx 1$ and $\pi'(a) \approx 0$, leading to an indeterminate force of infection
$\lambda(a) = \pi'(a)/\{1 - \pi(a)\}$. In Section 5.4 we assess the sensitivity of the results to the
former issue, repeating all analyses using simulated serological data for the age range
[40, 80).
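
The indeterminacy can be illustrated with a small numerical experiment. The sketch below assumes a hypothetical constant force of infection of 0.2 per year, so that $\pi(a) = 1 - \exp(-0.2a)$, and adds a small hypothetical error to $\pi'(a)$; neither value is taken from the data. It shows how the same error in the derivative is amplified in $\lambda(a)$ as $\pi(a)$ approaches one.

```python
import math

def force_of_infection(pi, dpi):
    """lambda(a) = pi'(a) / (1 - pi(a))."""
    return dpi / (1.0 - pi)

lam = 0.2    # hypothetical constant force of infection (per year)
eps = 1e-4   # hypothetical absolute error in the estimated derivative pi'(a)

errors = {}
for age in (5, 20, 40):
    pi = 1.0 - math.exp(-lam * age)        # prevalence at this age
    dpi = lam * math.exp(-lam * age)       # its derivative
    exact = force_of_infection(pi, dpi)
    perturbed = force_of_infection(pi, dpi + eps)
    errors[age] = perturbed - exact
    print(f"age {age}: pi={pi:.5f} lambda={exact:.3f} error={errors[age]:.4f}")
```

The error in $\lambda(a)$ equals the error in $\pi'(a)$ divided by $1 - \pi(a)$, so it grows without bound as the prevalence saturates.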

Figure 4 displays the estimated prevalence function and force of infection for the
discrete model M3 (left) and the continuous model M7 (right). The results are remarkably
similar. The effect of making q age-dependent is visualized by comparing
Figure 4 to the fit of model C1, which was very close to model C3, in Figure 3 (on
the right). The models assuming age-dependent proportionality estimate an initially
higher force of infection and a steeper decrease from the age of ten, after which the
force of infection is reduced by a factor two, compared to the constant proportionality
model. While the latter model predicts total immunity for VZV at older ages, the
age-dependent proportionality models estimate a fraction of seropositives which is below
one at all times.







Figure 4: Estimated prevalence (upper curve) and force of infection (lower curve) for the discrete 
model M3 (left) and the continuous model M7 (right). 



5.3 Model selection and multi-model inference 

Table 4 presents all candidate models for the proportionality factor q we have collected
up till now, among which the constant proportionality model C3, the discrete
age-dependent proportionality models M1, M2 and M3, and the continuous age-dependent
proportionality models M6, M7 and M8. Further for each model, the number of parameters
K, the AIC-value, the AIC difference $\Delta_k$, the Akaike weight $w_k$ and the evidence
ratio (ER) are displayed.

Model M3 with an assortative component $\gamma_1$ and a background component $\gamma_2$ is
the 'best' model for q according to the AIC-criterion. However, model selection uncertainty
is likely to be high since the selected best model has an Akaike weight of
only 0.293 (Burnham and Anderson, 2002). The evidence ratios for M3 versus M1 and
M2 are both 1.1, which means there is weak support for the best model. If many
independent samples could be drawn, the three discrete age-dependent models would
probably compete with each other for the 'best' model position. The continuous models M6
and M7 have evidence ratios around 3.5, indicating that these models also contribute
some information. Models C3 and M8 have the largest AIC differences, very small
Akaike weights (< 0.005) and very large evidence ratios (84.9 and 61.9 respectively),
which means there is little support for these two models.

Since there is no single model in the candidate set that is clearly superior to the
others and since the estimate for the basic reproduction number $R_0$ varies noticeably
over the candidate models, we are not inclined to base prediction only on M3. Applying
the concepts of model averaging, as described in Burnham and Anderson (2002),
a weighted estimate of $R_0$ is calculated, based on the model estimates $\hat{R}_{0,k}$ and the
corresponding Akaike weights $w_k$:

$$\bar{R}_0 = \sum_{k=1}^{7} w_k \hat{R}_{0,k}.$$

With the bootstrap procedure, we obtain a 95% percentile confidence interval for this
model averaged estimate $\bar{R}_0$, namely [4.4, 351.6]. Again, there is a large upper limit
induced by the same issues reported in Section 5.2.



5.4 Sensitivity analysis 

In order to assess the lack-of-data problem, we simulate serological data for the age
range [40, 80) using a constant prevalence $\pi = 0.983$, which is estimated from a thin
plate regression spline model for the original serological data. Sample sizes for one-year
age groups are chosen according to the Belgian population distribution in 2003
(FOD Economie Afdeling Statistiek, 2006) and the total size of the serological data now
amounts to n = 3856. The seven candidate models for the proportionality factor q are
now applied to the 'complete' serological data set.
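
A minimal sketch of this augmentation step is given below. It draws a binary serostatus for each simulated individual with the constant prevalence 0.983 stated above. The sample sizes per one-year age group are hypothetical placeholders (25 per group), since the population-based sizes used in the paper are not listed here, and the function name `simulate_serology` is ours.

```python
import random

def simulate_serology(age_counts, prevalence, rng):
    """Draw a 0/1 serostatus for each simulated individual, by one-year age group."""
    records = []
    for age, n in age_counts.items():
        for _ in range(n):
            records.append((age, 1 if rng.random() < prevalence else 0))
    return records

rng = random.Random(7)
# Hypothetical sample sizes: 25 individuals per one-year age group in [40, 80).
age_counts = {age: 25 for age in range(40, 80)}
simulated = simulate_serology(age_counts, prevalence=0.983, rng=rng)

fraction_positive = sum(status for _, status in simulated) / len(simulated)
print(len(simulated), round(fraction_positive, 3))
```

The simulated records would then be appended to the observed serological data before refitting the candidate models.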

The results are presented in Table 5 and are, overall, quite similar to the results
obtained before (Table 4). The 95% percentile confidence intervals for $R_0$ (B = 599
bootstrap samples converged out of 700), however, are narrower since the simulated data
for the age range [40, 80) are 'forcing' the proportionality factor q to follow a natural
pace. This is illustrated for model M7 in Figure 5, where the estimated function q(a)
is depicted for 100 randomly chosen bootstrap samples. In particular, the right confidence
interval limits for $R_0$ are smaller, whereas for most models the $R_0$ estimate seems to
have decreased only slightly.

Model selection uncertainty is illustrated quite nicely here, since four models, M7,
M3, M2 and M1, have Akaike weights close to 0.24 and these models also had the most
support for the original data set (Table 4). The model averaged estimate of $R_0$ now equals
5.64 and the 95% bootstrap-based percentile confidence interval is [4.7, 7.5].
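
The model averaged estimate can be recomputed from the AIC values and $R_0$ estimates reported in Table 5. The following sketch does so, assuming the Akaike-weighted average of Burnham and Anderson (2002); the dictionary layout and the function name `model_average` are illustrative choices of ours.

```python
import math

# (AIC, R0 estimate) per candidate model, as reported in Table 5.
table5 = {
    "C3": (1618.747, 7.98),
    "M1": (1548.714, 4.20),
    "M2": (1548.627, 4.74),
    "M3": (1548.344, 8.28),
    "M6": (1551.321, 4.96),
    "M7": (1547.973, 5.22),
    "M8": (1610.113, 2.69),
}

def model_average(models):
    """Akaike weights and the weighted average of the model-specific estimates."""
    aic_min = min(aic for aic, _ in models.values())
    rel = {k: math.exp(-0.5 * (aic - aic_min)) for k, (aic, _) in models.items()}
    total = sum(rel.values())
    weights = {k: v / total for k, v in rel.items()}
    averaged = sum(weights[k] * models[k][1] for k in models)
    return weights, averaged

weights, r0_averaged = model_average(table5)
print(f"model averaged R0 = {r0_averaged:.2f}")
```

Models C3 and M8 receive weights that are numerically negligible, so the average is driven by M1, M2, M3, M6 and M7.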
Table 5: Candidate models for the proportionality factor applied to the serological data set augmented
with simulated data, together with ML-estimates for the transmission parameters and $R_0$,
95% bootstrap-based percentile confidence intervals, and several measures related to model
selection.

Model  Parameter   Estimate  95% CI            $R_0$  95% CI for $R_0$  K  AIC       $\Delta_k$  $w_k$     ER
C3     $q$         0.159     [0.126, 0.195]    7.98   [6.60, 10.19]     1  1618.747  70.774      < 0.0001  > 10^3
M1     $\gamma_1$  0.189     [0.137, 0.250]    4.20   [3.88, 5.74]      2  1548.714  0.741       0.201     1.4
       $\gamma_2$  0.052     [0.021, 0.095]
M2     $\gamma_1$  0.186     [0.136, 0.247]    4.74   [4.36, 6.07]      2  1548.627  0.654       0.210     1.4
       $\gamma_2$  0.052     [U.uzu. u.uyij
M3     $\gamma_1$  0.189     [0.137, 0.250]    8.28   [6.43, 11.52]     2  1548.344  0.371       0.242     1.2
       $\gamma_2$  0.044     [0.016, 0.082]
M6     $\gamma_0$  -1.561    [-1.934, -1.120]  4.96   [4.47, 6.54]      2  1551.321  3.348       0.055     5.3
       $\gamma_1$  -0.035    [-0.067, -0.014]
M7     $\gamma_0$  -1.793    [-2.247, -1.079]  5.22   [4.60, 7.51]      3  1547.973  0.000       0.292     1.0
       $\gamma_1$  0.030     [-0.074, 0.126]
       $\gamma_2$  -0.002    [-0.006, 0.001]
M8     $\gamma_0$  -1.458    [-2.061, -0.844]  2.69   [2.08, 12.97]     2  1610.113  62.140      < 0.0001  > 10^3
       $\gamma_1$  -0.103    [-0.254, 0.016]


6 Concluding remarks 

In this paper, an overview of different estimation methods for infectious disease parameters
from data on social contacts and serological status was given. The theoretical
framework included a compartmental MSIR-model, taking into account the presence of
maternal antibodies, and the mass action principle, as presented by Anderson and May
(1991). An important assumption made was the one of endemic equilibrium, which
means that infection dynamics are in a steady state. The serological data set we
used was collected over 17 months, averaging over potential epidemic cycles of VZV
in Belgium during that period. In Section 3, we have illustrated the traditional, basic
approach of imposing mixing patterns on the WAIFW-matrix to estimate transmission
parameters from serological data. In contrast, the novel approach of using social contact
data to estimate infectious disease parameters avoids the choice of a parametric
model for the entire WAIFW-matrix.

The idea is fairly simple: transmission rates for infections that are transmitted from
person to person in a non-sexual way, such as VZV, are assumed to be proportional to
rates of making conversational and/or physical contact, which can be estimated from
contact surveys. Although more time consuming, the bivariate smoothing approach as
proposed in Section 4 was better able to capture important features of human contacting
behavior, compared to the maximum likelihood estimation method of Wallinga et al.
(2006). However, when a non-parametric bootstrap approach was applied to take into
account sampling variability, convergence problems arose, probably due to the large
number of zeros in combination with the log-link. Therefore, a mixture of Poisson
distributions or a zero-inflated negative binomial distribution could be more appropriate.
Figure 5: q(a) estimates for model M7, shown for 100 randomly chosen bootstrap samples from the
original serological data (left) and from this data augmented with simulated data for [40, 80)
(right).



Further, in Section 4, we dealt with a couple of challenges posed by Halloran (2006).
The social contact survey contained useful additional information on the contact itself, 
which allowed us to target very specific types of contact with high transmission po- 
tential for VZV. Furthermore, a non-parametric bootstrap approach was proposed to 
improve statistical inference. 

The constant proportionality assumption was relaxed in Section 5 and we have 
shown that an improvement of fit could be obtained by disentangling the transmission 
rates into a product of two age-specific variables: the age-specific contact rate and 
an age-specific proportionality factor. The latter may reflect, for instance, differences 
in characteristics related to susceptibility and infectiousness or discrepancies between 
the social contact proxies measured in the contact survey and the true contact rates 
underlying infectious disease transmission. We would like to emphasize that there 
probably exist other models for q(a, a') than the ones considered in Section 5, which fit 
the data even better. Our choice of a set of plausible candidate models was directed 
by parsimony on the one hand, limiting the total number of parameters to three, and 
prior knowledge on the other hand, considering loglinear models. Furthermore, we 
restricted analyses to close contacts lasting longer than 15 minutes, which means that 
close contacts of short duration and non-close contacts are assumed not to contribute 
to transmission of VZV. 

It is important to note that different assumptions concerning the underlying type of
contact, as well as different parametric models for q(a,a'), are likely to entail different
estimates of $R_0$; however, they may still induce a similar fit to the serological data.
In order to deal with this problem of model selection uncertainty we have turned to
multi-model inference in Section 5.3. In Figure 6, estimates of $R_0$ are presented for the
main estimation methods considered in this paper: the traditional method of imposing
mixing patterns on the WAIFW-matrix (W4) and the method of using data on social
contacts, assuming constant proportionality (the saturated model SA, C1 and C3) and
age-dependent proportionality (M1, M2 and M3). There is a pronounced variability in
the estimates of $R_0$, which is partially captured by the model averaged estimate MA,
calculated from Table 4.
Figure 6: $R_0$ estimates for mixing pattern W4, applied to the serological data in Section 3.3, and
for the following models using social contact data: the saturated model (SA) as proposed
by Wallinga et al. (2006), applied in Section 4.2.3 assuming constant proportionality, and
further bivariate smoothing models: constant proportionality models C1 and C3 considering
all and close contacts longer than 15 minutes, respectively (Section 4.3.1), and discrete
age-dependent proportionality models M1, M2 and M3 (Section 5.1). The model averaged
estimates for $R_0$ calculated from Table 4 (MA), based on the original serological data, and
from Table 5 (MA), based on the serological data set augmented with simulated data, are
displayed, as well as 95% bootstrap-based percentile confidence interval limits for the latter:
[MA_L, MA_R].



When estimating q(a, a'), we were actually faced with three problems of indeterminacy.
First, there is lack of serological information for individuals aged 40 and older;
second, prevalence of VZV rapidly stagnates, leading to an indeterminate force of infection;
and third, serological surveys do not provide information related to infectiousness.
Models which only expressed age differences in q for infectious individuals, such as the
discrete model M5 (Section 5.1) and the continuous models M8 and M9 (Section 5.2),
either did not lead to convergence or induced unrealistically large bootstrap estimates
for q at older ages.

A sensitivity analysis in Section 5.4 showed that lack of serological data had a big
impact on confidence intervals for $R_0$. We simulated data for the age range [40, 80),
giving rise to a model averaged estimate MA as displayed in Figure 6 with corresponding
confidence interval limits [MA_L, MA_R]. The latter problems of indeterminacy might be
controlled by combining information on the same infection over different countries or
on different airborne infections, assuming there is a relation between the country- or
disease-specific q(a,a'), respectively. This strategy already appeared beneficial when
estimating $R_0$ directly from seroprevalence data, without using social contact data
(Farrington et al., 2001).



Further, the impact of intervention strategies such as school closures, might be in- 
vestigated by incorporating transmission parameters, estimated from data on social 
contacts and serological status, in an age-time-dynamical setting. Finally, it is impor- 
tant to note that the models rely on the assumptions of type I mortality and type I 
maternal antibodies in order to facilitate calculations. Consequently, model improve- 
ments could be made through a more realistic approach of demographical dynamics. 